ABSTRACT

The generalizability of factor structures of student
ratings of instruction based on instructors' individual differences
was examined. The subjects were instructors from the humanities,
social science, and mathematics divisions who had had their courses
evaluated at least twice using the same evaluation questionnaire. The
data from the three divisions were factor analyzed and the resulting
factor structures compared using Kaiser's procedure. Only one factor
("defines student responsibilities") was found to be stable within
and across divisions. The results were explained in terms of the
distinction between within-class covariation and between-class
covariation. (Author)

Generalizability of Factor Structures Underlying
Student Ratings of Instruction¹

Isaac I. Bejar and Kenneth O. Doyle, Jr.²
University of Minnesota Measurement Services Center

Researchers over the years have devoted considerable effort to defining
the underlying factor structure of students' ratings of instruction (e.g.,
Smalzreid and Remmers, 1943; Creager, 1950; Wherry, 1952; Crannell, 1953;
Bendig, 1953, 1954; Coffman, 1954; Gibb, 1955; Isaacson, McKeachie, Milholland,
Lin, Hofeller, Baerwaldt, and Zinn, 1964; Deshpande, Webb, and Marks, 1970;
and Finkbeiner, Lathrop, and Schuerger, 1973). One outgrowth of these
studies has been the relatively common practice of preparing concise data
reports on the basis of a factor analysis so that an instructor, instead of
receiving a printout of descriptive statistics for two dozen or more separate
items, receives just four or five "factor scores" that summarize the ratings.

Some efforts to refine the concept of reliability have raised questions
about the legitimacy of such procedures, however. Cattell (1954) and Cronbach,
Gleser, Nanda, and Rajaratnam (1972) talk about the "consistency" or
"generalizability" of scores over various conditions such as people, occasions,
and items. This line of thinking can be applied as well to factor structures
as to other "scores." If factor structures do not vary over people or time
or other conditions, these factor structures are generalizable or consistent.
But when different students rate different instructors in different courses,
it seems curious that the factor structures should be the same. Since factor
structures of ratings are essentially statements of which instructor
characteristics the students perceive to covary, one might expect these
covariations to be instructor-specific. But if factor structures are
instructor-specific (i.e., are not consistent or generalizable), then there
would seem to be no rationale for computing factor scores on the basis of any
one analysis. Under such circumstances, either factor scores cannot be used at
all, or separate factor analyses need to be performed for every different
condition.

At least two studies have directly addressed the question of
generalizability of factor structures of student ratings. Isaacson et al.
(1964), with ratings on a 46-item instrument from more than 1200 students in 33
sections of a beginning course in psychology, used Kaiser's factor similarity
technique (Kaiser, 1960; see also Kaiser, Hunka, and Bianchini, 1971)³ and
found six factors to be consistent across sex of student and over two academic
terms. Finkbeiner, Lathrop, and Schuerger (1973), with a 48-item instrument
completed by almost 8,000 students in some 650 classes, employed Schneewind and
Cattell's procedure (1971) and found a five-factor solution obtained on the
main campus of a state university to be highly congruent with one obtained at
academic centers.

From these studies it would appear that factor structures are generalizable
over sex of student, academic term, and those characteristics that differentiate
main-campus students from those at academic centers.

However, there is a question about the units of analysis employed in
these studies. In both cases the ratings given to various instructors
were merged, and the factor analyses were performed on a correlation
matrix based on these pooled data matrices. The effect of this procedure is
to confound two independent sources of covariation: the between-instructors
covariation and the pooled within-class covariation. Within-class covariation
is obtained by intercorrelating item responses, while between-instructors
covariation is obtained by intercorrelating item means. Because the calculation
of means cancels out student individual differences (e.g., rater response
tendencies; see Guilford, 1954, pp. 278-89), between-instructors covariation
would seem more descriptive of the instructors themselves.
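
The distinction between the two sources can be made concrete with a small numerical sketch. The function name, the toy ratings, and the use of NumPy are assumptions made for illustration only; they are not the computations used in the studies cited. Between-instructors correlations are taken over class means, and pooled within-class correlations over deviations from each class mean.

```python
import numpy as np

def between_and_within_correlations(ratings, class_ids):
    """Return (between-instructors, pooled within-class) item correlations.

    ratings:   (n_students, n_items) array of item responses
    class_ids: (n_students,) labels saying which class each rater sat in
    """
    ratings = np.asarray(ratings, dtype=float)
    class_ids = np.asarray(class_ids)
    classes = np.unique(class_ids)

    # Between-instructors: intercorrelate the item means, one row per class.
    means = np.vstack([ratings[class_ids == c].mean(axis=0) for c in classes])
    r_between = np.corrcoef(means, rowvar=False)

    # Pooled within-class: remove each class mean, then intercorrelate
    # the remaining student-level deviations.
    deviations = ratings.copy()
    for row, c in enumerate(classes):
        deviations[class_ids == c] -= means[row]
    r_within = np.corrcoef(deviations, rowvar=False)
    return r_between, r_within

# Hypothetical toy data: 3 classes, 2 students each, 2 items.
ratings = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
class_ids = ["a", "a", "b", "b", "c", "c"]
r_between, r_within = between_and_within_correlations(ratings, class_ids)
print(round(r_between[0, 1], 3))  # class means rise together
print(round(r_within[0, 1], 3))   # within a class the two items move oppositely
```

In this contrived case the two items correlate positively across classes and negatively within classes, so a correlation computed on the merged ratings would describe neither source cleanly.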
The purpose of the present study, then, is to examine the generalizability
of factor structures based on instructor individual differences.

METHOD 

Rating Scale. The "Student Evaluation Form, Part I" includes 26 rating
items on 5-point Always to Never scales. An earlier factor analysis (Doyle
and Liu, 1972) using both alpha and principal factors solutions
had found four factors tentatively named "Student-orientedness,"
"Organization," "Instructor Presence" (i.e., clear and enthusiastic), and
"Intellectual Expansiveness" (i.e., broadminded).

Samples. Students in all courses in the Humanities, Social Science,
and Science/Mathematics divisions of the Morris campus of the University of
Minnesota during the Fall 1972 and Winter 1973 terms rated their instructors.
These ratings were given in class toward the end of each term. The raters
remained anonymous. From this pool of profiles, two samples were drawn,
without replacement, in the following fashion. For the initial sample, an
instructor's name was drawn at random. If he or she taught more than one
course, freshman/sophomore courses were preferred to junior/senior ones. If
two or more courses were still "eligible," one was selected at random.
A repetition sample was chosen in similar fashion from the remaining profiles.
Instructors who appeared in both samples were retained: 25 instructors in
Humanities, 15 in Social Science, and 15 in Science/Mathematics.
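
The sampling rule can be sketched in a few lines. This is only an illustration under stated assumptions: the data layout (a dictionary of instructor names mapped to course identifiers with a "lower" or "upper" level tag), the function names, and the fixed seed are all hypothetical and do not come from the original study.

```python
import random

def draw_course(courses, rng):
    """Pick one course, preferring freshman/sophomore ("lower") courses."""
    lower = [c for c in courses if c[1] == "lower"]
    eligible = lower if lower else courses
    return rng.choice(eligible)  # random choice among still-eligible courses

def draw_samples(profiles, seed=0):
    """Return {instructor: (initial_course, repetition_course)}.

    profiles maps an instructor name to a list of (course_id, level) tuples.
    Only instructors who can appear in both samples are retained.
    """
    rng = random.Random(seed)
    names = list(profiles)
    rng.shuffle(names)  # instructors are drawn in random order
    paired = {}
    for name in names:
        remaining = list(profiles[name])
        if len(remaining) < 2:
            continue  # a single profile cannot supply both samples
        first = draw_course(remaining, rng)
        remaining.remove(first)  # sampling without replacement
        second = draw_course(remaining, rng)
        paired[name] = (first, second)
    return paired

# Hypothetical pool of rated courses.
profiles = {
    "A": [("a1", "upper"), ("a2", "lower"), ("a3", "lower")],
    "B": [("b1", "lower")],
    "C": [("c1", "upper"), ("c2", "upper")],
}
samples = draw_samples(profiles)
print(sorted(samples))
```

Instructor "B" drops out because only one rated course is available, which mirrors the retention rule that an instructor must appear in both samples.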

Analyses. Item means were calculated for each class. Correlation
matrices were computed on these means, one for each division in both the
initial and the repetition sample. Factors were extracted by the method of
principal axes, with iterations on the communalities. The largest off-diagonal
values were taken as initial estimates of communality.⁴ Retaining factors
with eigenvalues greater than 1.0, four-, five-, and six-factor solutions were
found. Because five-factor solutions were most frequent, the analyses were
repeated with the number of factors to be extracted set at five. These
factors were rotated to a varimax criterion. The Kaiser et al. method (1970)
was used to compare these factors across and within divisions. Since no
sampling distribution for the similarity coefficient exists, .70 was taken as
the minimum indication of factor similarity. For the intradivisional comparisons,
the factor structure from the repetition sample was rotated to similarity
with the structure from the initial sample; for the interdivisional comparisons,
the first sample of each pair in Table 1 below was the target or criterion
structure.
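
The extraction and rotation steps described above can be sketched as follows. This is a minimal illustration, not the program used in the study: the function names, iteration limits, tolerances, and the small synthetic correlation matrix are assumptions, and the varimax routine is the common SVD-based formulation rather than any particular package's implementation.

```python
import numpy as np

def principal_axis(R, n_factors, n_iter=200, tol=1e-6):
    """Principal-axis factoring with iterated communalities."""
    R = np.asarray(R, dtype=float)
    # Initial communalities: largest absolute off-diagonal value in each row.
    off_diagonal = np.abs(R - np.diag(np.diag(R)))
    h2 = off_diagonal.max(axis=1)
    for _ in range(n_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)  # communalities replace the unit diagonal
        values, vectors = np.linalg.eigh(reduced)
        top = np.argsort(values)[::-1][:n_factors]
        loadings = vectors[:, top] * np.sqrt(np.clip(values[top], 0.0, None))
        new_h2 = (loadings ** 2).sum(axis=1)
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings, h2

def varimax(loadings, n_iter=100, tol=1e-8):
    """Orthogonal varimax rotation (SVD-based iteration)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(n_iter):
        rotated = loadings @ rotation
        column_ss = np.diag((rotated ** 2).sum(axis=0))
        u, s, vt = np.linalg.svd(loadings.T @ (rotated ** 3 - rotated @ column_ss / p))
        rotation = u @ vt
        if s.sum() < criterion * (1 + tol):
            break
        criterion = s.sum()
    return loadings @ rotation

# Synthetic example: 8 items built from 2 uncorrelated factors (hypothetical).
true_loadings = np.array([
    [0.90, 0.00], [0.80, 0.00], [0.70, 0.00], [0.60, 0.00],
    [0.00, 0.75], [0.00, 0.65], [0.00, 0.55], [0.00, 0.50],
])
R = true_loadings @ true_loadings.T
np.fill_diagonal(R, 1.0)

# Eigenvalues-greater-than-1.0 rule, applied to the unreduced matrix.
n_kaiser = int((np.linalg.eigvalsh(R) > 1.0).sum())
loadings, h2 = principal_axis(R, n_factors=n_kaiser)
rotated = varimax(loadings)
print(n_kaiser)
print(np.round(np.abs(rotated), 2))
```

Because eigenvector signs are arbitrary, the sketch prints absolute loadings; the column order of the recovered factors is likewise arbitrary.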

RESULTS 

Comparison across academic divisions. The five most similar pairs from
each of the three interdivisional comparisons are presented in Table 1, with
factor similarity coefficients, indications of salient loadings, and fit.
Salient loadings are those having a significant correlation at .05 with their
factor (see Gorsuch, 1974). The fit for each set of comparisons is the
average of the cosines of the angles of the corresponding pairs of items
(see Kaiser et al., 1970). One factor was very stable across all of the
interdivisional comparisons, with similarity coefficients of .82, .90, and
.93. This factor is defined by "clearly indicates what material
tests will cover," "clearly defines student responsibilities in the
course," and "gives adequate information during the course regarding student
progress through quizzes, tests, or other feedback." "Definition of Student
Responsibility" seems a reasonably accurate name for this factor. One factor
was stable across Humanities and Social Sciences, but not across the other
comparisons. This appeared to be a "Broadmindedness" factor: "presents or
allows various points of view," "welcomes criticisms from students," "invites
criticisms of his own ideas," and "encourages class discussion." A rather
difficult to name "Empathy/Clarity/Stimulation" factor was common to Humanities
and Social Science: "is concerned about the effectiveness of his
teaching," "is genuinely interested in students," "welcomes questions from
students," "is well informed on the materials presented," "clearly interprets
abstract ideas and theories," "attempts to stimulate creative abilities,"
and "is enthusiastic about his subject." Finally, a weak "Course Coordination"
factor seemed common to Social Science and Science/Mathematics; the two factors
shared no salient items but were similar in overall pattern. The defining
items were "keeps the course moving rapidly enough for the material," "makes
it clear how each topic fits into the course," and "demands a reasonable amount
of work."

Insert Table 1 about here
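
The two quantities reported for each comparison, cosines between factors and a fit based on cosines between corresponding items, can be illustrated with a simplified stand-in. The sketch below uses an ordinary orthogonal Procrustes rotation to a target; it is not the Kaiser et al. procedure itself, and the function name, the loading matrices, and the 30-degree rotation are assumptions chosen for the example.

```python
import numpy as np

def relate_factors(target, other):
    """Rotate `other` toward `target` and summarize the agreement.

    Returns (cosines, fit): cosines[k, j] is the cosine between factor k of
    `other` and factor j of `target` once the item configurations are aligned;
    fit is the mean cosine between corresponding item vectors.
    """
    target = np.asarray(target, dtype=float)
    other = np.asarray(other, dtype=float)
    # Orthogonal Procrustes: Q minimizes ||other @ Q - target|| over rotations.
    u, _, vt = np.linalg.svd(other.T @ target)
    cosines = u @ vt
    aligned = other @ cosines
    item_cosines = (aligned * target).sum(axis=1) / (
        np.linalg.norm(aligned, axis=1) * np.linalg.norm(target, axis=1)
    )
    return cosines, float(item_cosines.mean())

# Hypothetical target loadings and a copy expressed on axes turned by 30 degrees.
target = np.array([
    [0.8, 0.1], [0.7, 0.2], [0.6, 0.0],
    [0.1, 0.7], [0.2, 0.8], [0.0, 0.6],
])
angle = np.deg2rad(30.0)
turn = np.array([[np.cos(angle), -np.sin(angle)],
                 [np.sin(angle),  np.cos(angle)]])
other = target @ turn.T

cosines, fit = relate_factors(target, other)
similar = np.abs(cosines) >= 0.70  # the .70 criterion adopted in the text
print(np.round(cosines, 3))
print(round(fit, 3))
print(similar)
```

Here the two item configurations are identical apart from the axis rotation, so the fit is essentially perfect while the factor cosines reflect the 30-degree turn.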

According to the Kaiser statistic, then, only one factor, "Definition
of Student Responsibility," is stable across all interdivisional comparisons,
although three other factors are common to one or another pairing of divisions.
Visual inspection of Table 1 suggests that these partially generalizable
factors (e.g., "Broadmindedness") do appear in other comparisons (e.g., in
Science/Math and Social Science) and that additional factors seem sometimes to
emerge (e.g., a "Stimulation" factor for Humanities and Social Sciences), but
in all these cases the nuclear items are sometimes related to some items, other
times to others, and generally quite difficult to interpret.
Comparisons within academic divisions. The results for the intradivisional
comparisons (factor similarity coefficients, indication of salient loadings,
and fit) are presented in Table 2.

There seems to be a generally greater stability of factor structures within
divisions than between, in that a total of 11 intradivisional factor pairs met
the .70 similarity criterion, compared to 6 pairs between divisions.
At the same time, there is considerable variability in the intradivisional
results. All 5 Humanities pairs met the criterion (range /./5/ to /.92/);
five Social Science pairs were also very similar, but with a narrower range
(/.71/ to /.7S/); only 1 Science/Math pair had a coefficient of .70 or
greater (/.jG/ to 1.151). Fit also tended to be slightly higher within than
across divisions.

The factor that was common across all divisions, "Definition of Student
Responsibility," seems clearly defined within each of the divisions too,
although it failed to meet the similarity criterion in Science/Mathematics. An
"Empathy/Clarity" or "Presence" factor was replicable for Humanities, as were
ones describing "Broadmindedness" and "Stimulation." A fifth Humanities
factor is difficult to interpret; it seems to be a second version of instructor
presence. The "Empathy/Clarity" factor from the Humanities appeared in but
was not replicable for Science/Math, and was quite different in Social Science,
where a factor portraying clarity appeared in its own right. Conversely, a
factor with items describing broadmindedness, empathy, and stimulation was
replicable in Social Science, while one including broadmindedness and stimulation,
but not empathy, was stable within Science/Mathematics. In short, the factors are
for the most part specific to each academic division and even within divisions
are not very generalizable.
DISCUSSION

Some comment about the numbers of instructors used in this study would be
appropriate. A rule of thumb is not to factor analyze unless the number of
cases (instructors) is at least five or ten times the number of variables
(items), and especially not to factor analyze singular matrices (where the
number of variables exceeds the number of cases). However, Rummel (1970, p. 220)
points out that factor analysis of singular matrices allows descriptions of
data variability, even though inference from sample results to universal
factors is limited. Having more variables than cases imposes a necessary
dependence on the interrelationships that can bias the inferences that could
otherwise be drawn. The present study analyzes 26 variables for 15 and 25
cases, which would allow up to 15 (or 25) independent factors to emerge and
would certainly allow the major patterns of relationships to appear.
Further support for this factor analysis of singular matrices comes from a
computer-simulation study (Bejar & Doyle, in preparation) in which 25 variables
were factor analyzed for 26, 20, 15, and 10 cases. Kaiser's factor-comparison
procedure (1970) found a 5-factor solution (off-diagonal initial communalities,
varimax rotation) recoverable even for the 10 cases. But in any case a
replication of the present study would help determine its generalizability.
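
The dependence imposed by having more variables than cases can be seen by counting the nonzero eigenvalues of a correlation matrix computed from fewer cases than items. The sketch below uses simulated normal scores as a stand-in for class means; the function name, seed, and tolerance are assumptions for illustration. In this sketch the count comes out at one fewer than the number of cases, because computing correlations removes the item means.

```python
import numpy as np

def nonzero_eigenvalue_count(n_cases, n_items=26, seed=0, tol=1e-8):
    """Count eigenvalues of an item correlation matrix that exceed `tol`."""
    rng = np.random.default_rng(seed)
    scores = rng.normal(size=(n_cases, n_items))  # simulated class means
    R = np.corrcoef(scores, rowvar=False)         # n_items x n_items, singular
    eigenvalues = np.linalg.eigvalsh(R)
    return int((eigenvalues > tol).sum())

count_15 = nonzero_eigenvalue_count(15)
count_25 = nonzero_eigenvalue_count(25)
print(count_15, count_25)  # 14 and 24 for 26 items
```

Either bound is well above the five factors extracted in the present analyses.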

The question arises of why the present study found differential and limited
factor generalizability while previous investigations (e.g., Isaacson et al.,
1964; Finkbeiner et al., 1973) found consistent and very high factor similarity.
Some elaboration of the differences among within-class, between-instructors,
and total covariance matrices may help resolve the apparent divergence of
results, since the earlier studies analyzed total-variance matrices while the
present one analyzed only the between-instructors matrix.

The sum of the variance-covariance matrices computed on each class,
weighted by their respective degrees of freedom (number of students in each
class minus 1), constitutes the pooled within-class sum of squares and cross
products matrix (W). The variance-covariance matrix using class means
(adjusting for class size), weighted by its degrees of freedom (number of
classes minus 1), constitutes the between-instructor sum of squares and cross
products matrix (B). The sum of these two matrices equals the total sum of
squares and cross products matrix (T), i.e.,

          T = B + W

By appropriate rescaling, each of these 3 matrices can be converted to a
correlation matrix and factor analyzed. Thus a factor analysis based on T
reflects both within-class and between-instructor covariation. Whether W or B
is more similar to T cannot be predicted beforehand. However, if a) the
degrees of freedom for the within-class source are large in relation to the
between-instructor source and/or b) the within-class covariation is larger
than the between-instructor, then T is more similar to W than to B. To the
extent that these two conditions were fulfilled in the Isaacson and Finkbeiner
studies (and at least condition a) was), the similarities found were from
the within-class source. In the present study, only the between-instructor
source was analyzed. Hence the apparently discrepant findings are the result of
analyzing different sources of covariation.

The principal difference between within-class and between-instructors
covariation lies in the computational treatment of student individual differences.
Within classes, students' deviations from class means are treated as true
variance. Between instructors, student differences are treated as error and
are "averaged out" by computation of the class means. Between-instructors
data, then, more nearly describe instructor individual differences, while
within-class data describe student differences. Thus halo effect and similar
rater tendencies are more likely to be diminished in the between-instructors
matrix, and so that matrix would seem to be the preferred one for many studies.
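
The decomposition of the total sum of squares and cross products matrix into its between and within parts can be checked numerically. The following sketch assumes raw ratings with a class label for each rater; the function names and toy data are hypothetical, and the rescaling to correlations is the usual division by the square roots of the diagonal.

```python
import numpy as np

def sscp_decomposition(ratings, class_ids):
    """Return total (T), between (B), and pooled within (W) SSCP matrices."""
    X = np.asarray(ratings, dtype=float)
    ids = np.asarray(class_ids)
    grand_mean = X.mean(axis=0)
    n_items = X.shape[1]
    W = np.zeros((n_items, n_items))
    B = np.zeros((n_items, n_items))
    for c in np.unique(ids):
        Xc = X[ids == c]
        class_mean = Xc.mean(axis=0)
        within_dev = Xc - class_mean
        W += within_dev.T @ within_dev         # deviations from the class mean
        d = (class_mean - grand_mean)[:, None]
        B += len(Xc) * (d @ d.T)               # class means, weighted by class size
    total_dev = X - grand_mean
    T = total_dev.T @ total_dev
    return T, B, W

def to_correlation(S):
    """Rescale a sum of squares and cross products matrix to correlations."""
    scale = np.sqrt(np.diag(S))
    return S / np.outer(scale, scale)

# Hypothetical toy data: 3 classes, 2 students each, 2 items.
ratings = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
class_ids = ["a", "a", "b", "b", "c", "c"]
T, B, W = sscp_decomposition(ratings, class_ids)

print(np.allclose(T, B + W))
print(round(to_correlation(T)[0, 1], 3))
print(round(to_correlation(B)[0, 1], 3), round(to_correlation(W)[0, 1], 3))
```

In this toy case the between-class covariation is much larger than the within-class covariation, so the correlation based on T lies closer to the one based on B; with many raters per class and modest differences among class means, the balance would tip toward W instead.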
But cancelling out student differences by computing means removes not only
some response tendencies but reliable statements of differential student/
instructor interactions as well. So the portrayal of instructor individual
differences by between-instructors data is accomplished at the cost of
information about these interactions. The extent of this loss depends on the
reliability of individual students' ratings and on the homogeneity of the
class; the more reliable the ratings and the more heterogeneous the class,
the greater is the loss of information about differential effectiveness.
More research attention needs to be given to within-class data in general,
but particularly to ways of increasing the reliability of individual students'
ratings and to identifying patterns of differential student/instructor
relationships. Similarly, the between-instructors matrix may provide a
fruitful area of study, especially for the identification of effective
instructional practices and for the validation of student ratings. It is
suggested that the total variance matrix, because it confounds the between
and within components of variance, be banished forever from the literature.
FOOTNOTES

1. Requests for reprints should be sent to Isaac I. Bejar, Measurement Services
Center, 9 Clarence Avenue SE, Minneapolis, Minnesota 55414.

2. The authors are indebted to Dr. Susan Whitely for comments on an earlier
version of this paper.

3. The authors are indebted to Dr. H.F. Kaiser for supplying a listing of
the program for doing the factor comparisons. Veldman (1967) also lists
a similar program (RELATE), although it applies only to the case in which two
orthogonal factor matrices are compared.

4. The available computer programs cannot compute R² as the estimate
of communality when the correlation matrix is singular because the
matrix cannot be inverted. However, Tucker and (1973) have
provided a more general procedure which can be used with singular
correlation matrices.